Pruned Random Subspace Method for One-Class Classifiers

Authors

  • Veronika Cheplygina
  • David M. J. Tax
Abstract

The goal of one-class classification is to distinguish the target class from all other classes using only training data from the target class. Because it is difficult for a single one-class classifier to capture all the characteristics of the target class, combining several one-class classifiers may be required. Previous research has shown that the Random Subspace Method (RSM), in which classifiers are trained on different subsets of the feature space, can be effective for one-class classifiers. In this paper we show that the performance of the RSM can be noisy, and that pruning inaccurate classifiers from the ensemble can be more effective than using all available classifiers. We propose to apply pruning to RSM ensembles of one-class classifiers using a supervised AUC criterion or an unsupervised consistency criterion. It appears that when the AUC criterion is used, the performance may increase dramatically, while for the consistency criterion the results do not improve, but only become more predictable.
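The idea of pruning an RSM ensemble with a supervised AUC criterion can be illustrated with a minimal sketch. This is not the authors' implementation. The base classifier (a diagonal Gaussian), the score averaging, the synthetic data, and all function and parameter names below are assumptions chosen for illustration. The abstract only states that classifiers are trained on random feature subsets and that inaccurate ones are pruned.

```python
import random

def fit_gaussian(train, features):
    """Fit a diagonal Gaussian to target data on the chosen features (assumed base model)."""
    stats = []
    for f in features:
        column = [x[f] for x in train]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        stats.append((f, mean, var + 1e-9))  # small constant avoids division by zero
    return stats

def score(model, x):
    """Higher means more target-like: negative squared standardized distance."""
    return -sum((x[f] - mean) ** 2 / var for f, mean, var in model)

def auc(target_scores, outlier_scores):
    """Fraction of (target, outlier) pairs where the target scores higher; ties count half."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            wins += 1.0 if t > o else 0.5 if t == o else 0.0
    return wins / (len(target_scores) * len(outlier_scores))

def pruned_rsm(train, val_targets, val_outliers, n_models, subspace_dim, n_keep, seed=0):
    """Train n_models subspace classifiers and keep the n_keep with the best validation AUC."""
    rng = random.Random(seed)
    n_features = len(train[0])
    ranked = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_dim)  # random subspace
        model = fit_gaussian(train, features)
        a = auc([score(model, x) for x in val_targets],
                [score(model, x) for x in val_outliers])
        ranked.append((a, model))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in ranked[:n_keep]]

def ensemble_score(models, x):
    """Combine the retained classifiers by averaging their scores (an assumed combiner)."""
    return sum(score(m, x) for m in models) / len(models)

# Synthetic demo (hypothetical data): 10 features, only the first two are informative.
rng = random.Random(1)

def sample(n, shift):
    return [[rng.gauss(shift if f < 2 else 0.0, 1.0) for f in range(10)] for _ in range(n)]

train = sample(100, 0.0)                          # target-class training data only
val_t, val_o = sample(50, 0.0), sample(50, 4.0)   # labeled validation set for pruning
test_t, test_o = sample(50, 0.0), sample(50, 4.0)
models = pruned_rsm(train, val_t, val_o, n_models=20, subspace_dim=3, n_keep=5)
pruned_auc = auc([ensemble_score(models, x) for x in test_t],
                 [ensemble_score(models, x) for x in test_o])
print(f"pruned ensemble AUC on held-out data: {pruned_auc:.3f}")
```

Note that the AUC criterion needs some labeled outliers for validation, which is why it is called supervised. The unsupervised consistency criterion mentioned in the abstract is not sketched here because the abstract does not define it.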

Similar resources

Random Subspace Method with Feature Subsets Selected by a Fuzzy Class Separability Index

Classifier combining techniques have become popular for improving weak classifiers in recent years. The random subspace method (RSM) is an efficient classifier combining technique that can improve the classification performance of weak classifiers for small sample size (SSS) problems. In RSM, the feature subsets are randomly selected and the resulting datasets are used to train classifiers. How...

A Genetic Algorithm-Based Heterogeneous Random Subspace Ensemble Model for Bankruptcy Prediction

Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble techniques are known to be very useful in improving the generalization ability of a classifier. The random subspace ensemble technique is a simple but effective method of constructing ensemble classifiers, in which some features are randomly d...

A Novel Random Subspace Method for Online Writeprint Identification

With the widespread application of computer network technology, diverse anonymous cyber crimes have begun to appear in the online community. The anonymous nature of online information distribution makes writeprint identification a critical forensic problem. The difficulty of the task, however, is the huge number of features in even a moderate-sized available text corpus, which causes the problem of over-t...

ForesTexter: An efficient random forest algorithm for imbalanced text categorization

In this paper, we propose a new Random Forest (RF) based ensemble method, ForesTexter, to solve imbalanced text categorization problems. RF has shown great success in many real-world applications. However, the problem of learning from text data with class imbalance is a relatively new challenge that needs to be addressed. An RF algorithm tends to use a simple random sampling of features in b...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Publication date: 2011